Welcome to DiAna, your go-to R package for performing disproportionality analysis on the FDA Adverse Event Reporting System (FAERS). DiAna simplifies the process of importing cleaned FAERS data, retrieving cases of interest, conducting descriptive analysis, and performing disproportionality analysis. Whether you’re a novice or an expert in pharmacovigilance, DiAna is designed to make your analyses easy and efficient.
Before you can start using DiAna, you need to have R and R Studio installed on your system. If you haven’t installed them yet, follow these simple steps:
Installing R Visit the official R project website (https://cran.r-project.org) and download the appropriate version of R for your operating system (Windows, macOS, or Linux). Run the installer and follow the instructions to complete the installation. R provides a powerful and flexible environment for statistical computing and graphics, forming the foundation for your data analysis with DiAna.
Installing R Studio R Studio is an integrated development environment (IDE) for R that makes your R programming easier and more efficient. Once R is installed, go to the R Studio website (https://posit.co/download/rstudio-desktop/) and download the free version of R Studio Desktop. Install R Studio by following the installation instructions for your operating system.
With R and R Studio installed, you have a comprehensive and user-friendly environment ready for conducting pharmacovigilance analyses using DiAna.
For an organized and efficient workflow, we recommend creating a dedicated project for your DiAna analyses. Follow these steps to set up your DiAna project in R Studio:
Open R Studio: Start by opening R Studio on your computer. If you’ve just installed R Studio, you can find it in your applications or programs menu.
Create a New Project: In R Studio, click on File in the upper left corner, then select New Project. Choose New Directory and then New Project. Name your project (e.g., “DiAna”) and specify a location on your Desktop or any preferred directory.
Install the DiAna package: Begin by installing the DiAna package from GitHub writing and running the following lines in the console:
It may be useful to run this command from time to time to download the latest update to the package (e.g., bugs solved, new or improved functions)
With library DiAna we have imported the DiAna package (i.e., the toolbox with all the functions that we will use in our analyses). With setup_DiAna(quarter=“23Q1”) we are automaticatilly setting up the project: it will create a folder to store cleaned FAERS data, that will be downloaded from an OSF repository (in particular we are downloading the FAERS updated to the 23Q1 quarter). The entire cleaning process is made transparent on the github (https://github.com/fusarolimichele/DiAna). It will also create a folder for external sources and a folder to store projects. In the external sources, it will also download the DiAna dictionary used to translate free text drug names into active ingredients, a linkage to the ATC code, and other useful data sources.
As you can see some external sources are not available for download because they require subscription (e.g., MedDRA).
Now that you have R and R Studio installed, you have installed the DiAna package, and you have set up your project, you are all set to begin your journey with DiAna.
With DiAna the main project, we now want to create a subproject specific for this tutorial. The first thing we do is open a new R script (the white paper with the green and white cross on the top left corner of rstudio) and insert some details on the project. As specified before, every time we use a # we are inserting a comment. Comments are extremely useful to document and explain our project, but are not run and do not affect the results.
# Information -----------------------------------------------------------------
## Project title --------------------------------------------------------------
### Tutorial on the Basic usage of DiAna
## Data -----------------------------------------------------------------------
### FDA Adverse Event Reporting System Quarterly Data up to 23Q1
## Authors --------------------------------------------------------------------
### Michele Fusaroli
## Version --------------------------------------------------------------------
### Set up: 2023-10-08### Last update: 2023-10-08In this section, we guide you through setting up a subproject within
your DiAna analysis environment. The following R code snippet creates a
new directory named “tutorial” within the “projects” folder of your
DiAna package. It specifies the version of the FAERS dataset to be used
(in this case, “23Q1”). The here::here()
function helps locate the current DiAna package directory, and the
project_path variable is defined to point to the “tutorial” folder
within your DiAna package.
# Set up ----------------------------------------------------------------------
dir.create(paste0(here::here(), "/projects/tutorial"), recursive = TRUE)
#> Warning in dir.create(paste0(here::here(), "/projects/tutorial"), recursive =
#> TRUE): '/Users/michele.fusaroli/Desktop/DiAna_package/DiAna/projects/tutorial'
#> already exists
FAERS_version <- "23Q1"
DiAna_path <- here::here()
project_path <- paste0(DiAna_path, "/projects/tutorial/")By running this code, you establish a structured subproject environment. The “tutorial” folder will contain the necessary files and configurations for your subproject, ensuring a seamless and organized analysis experience.
Finally, we save the script in the subproject folder, using the blue floppy icon above the script.
In the DiAna package, seamless data import is a crucial aspect of empowering pharmacovigilance analyses. The first code chunk utilizes the import() function to load essential datasets from the FDA Adverse Event Reporting System (FAERS). By importing datasets like “DRUG,” “REAC,” “DEMO,” and “INDI,” DiAna equips users with comprehensive information about drugs, adverse reactions, patient demographics, and indications.
## Packages -------------------------------------------------------------------
library(DiAna)
## Input FAERS ---------------------------------------------------------------
import("DRUG")
import("REAC")
import("DEMO")
import("INDI")
# Try also importing and look at the information included in DEMO_SUPP, OUTC, THER, DOSES, DRUG_SUPP, DRUG_NAME| primaryid | sex | age_in_days | wt_in_kgs | occr_country | event_dt | occp_cod | reporter_country | rept_cod | init_fda_dt | fda_dt | premarketing | literature | RB_duplicates | RB_duplicates_only_susp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 883619455 | F | NA | 117 | NA | 20120101 | HP | Canada | EXP | 20121011 | 20230331 | FALSE | FALSE | FALSE | FALSE |
| 89418103 | M | 4015 | NA | France, French Republic | NA | HP | France, French Republic | EXP | 20121203 | 20230331 | FALSE | TRUE | FALSE | FALSE |
| 89516744 | M | 4015 | NA | France, French Republic | NA | HP | France, French Republic | EXP | 20121204 | 20230331 | FALSE | TRUE | FALSE | FALSE |
| 89516903 | M | 2920 | NA | France, French Republic | NA | HP | France, French Republic | EXP | 20121204 | 20230331 | FALSE | TRUE | FALSE | FALSE |
| 90224005 | M | NA | NA | Canada | 20020201 | MD | Canada | EXP | 20130118 | 20230331 | FALSE | FALSE | FALSE | FALSE |
| 91610302 | F | 7665 | NA | United States of America | NA | HP | United States of America | PER | 20130313 | 20230331 | FALSE | TRUE | FALSE | FALSE |
| primaryid | pt | drug_rec_act |
|---|---|---|
| 998834879 | weight increased | NA |
| 998834879 | joint noise | NA |
| 99953304 | drug interaction | NA |
| 99953304 | breast cancer | NA |
| 99974963 | breast cancer | NA |
| 99974963 | drug interaction | NA |
| primaryid | drug_seq | substance | role_cod |
|---|---|---|---|
| 99974963 | 5 | dexamethasone | I |
| 99974963 | 6 | ibuprofen | I |
| 99974963 | 7 | lorazepam | I |
| 99974963 | 8 | NA | I |
| 99974963 | 9 | paracetamol | I |
| 99974963 | 10 | zolmitriptan | SS |
| primaryid | drug_seq | indi_pt |
|---|---|---|
| 99974963 | 5 | product used for unknown indication |
| 99974963 | 6 | product used for unknown indication |
| 99974963 | 7 | product used for unknown indication |
| 99974963 | 8 | product used for unknown indication |
| 99974963 | 9 | product used for unknown indication |
| 99974963 | 10 | migraine |
Moreover, DiAna offers functionality to import data related to the Anatomical Therapeutic Chemical (ATC) classification linked to the active ingredients as translated by the DiAna dictionary. This ATC file is a crucial component for grouping drugs based on their therapeutic use. Please note that we cannot provide the Medical Dictionary for Regulatory Activities (MedDRA), which is only upon subscription, and therefore the import_MedDRA function will not work for you. If you have access to MedDRA, you can follow the instructions on the GitHub repository to create the MedDRA file for the import_MedDRA function.
import_ATC()
# we cannot make the meddra available because it is only upon subscription, but here we usually also use the import_MedDRA()) function for grouping events| substance | code | primary_code | Lvl4 | Class4 | Lvl3 | Class3 | Lvl2 | Class2 | Lvl1 | Class1 |
|---|---|---|---|---|---|---|---|---|---|---|
| antacids, unspecified | A02A | A02A | A02A | NA | A02A | Antacids | A02 | Drugs for acid related disorders | A | Alimentary and Metabolism |
| sodium perborate | A01AB19 | A01AB19 | A01AB | Antiinfectives and antiseptics for local oral treatment | A01A | Stomatological preparations | A01 | Stomatological preparations | A | Alimentary and Metabolism |
| domiphen | A01AB06 | A01AB06 | A01AB | Antiinfectives and antiseptics for local oral treatment | A01A | Stomatological preparations | A01 | Stomatological preparations | A | Alimentary and Metabolism |
| sodium monofluorophosphate | A01AA02 | A01AA02 | A01AA | Caries prophylactic agents | A01A | Stomatological preparations | A01 | Stomatological preparations | A | Alimentary and Metabolism |
| sodium fluoride | A01AA01 | A01AA01 | A01AA | Caries prophylactic agents | A01A | Stomatological preparations | A01 | Stomatological preparations | A | Alimentary and Metabolism |
| stomatological preparations, unspecified | A01A | A01A | A01A | NA | A01A | Stomatological preparations | A01 | Stomatological preparations | A | Alimentary and Metabolism |
In this code chunk, we define the specific parameters for our
analysis in DiAna. drugs_selected is set
to “haloperidol,” representing the drug of interest, and
events_selected is set to “pneumonia,”
indicating the adverse event under investigation. These parameters play
a pivotal role in guiding the disproportionality analysis. By specifying
the drug and adverse event of focus, users can narrow down their
analysis to a particular scenario, allowing for a more targeted and
insightful exploration of adverse event patterns related to the selected
drug. The flexibility to customize these and other parameters (e.g.,
population), as we will see in the following paragraphs, enhances the
precision of the analysis, enabling users to derive meaningful
conclusions about drug safety within the pharmacovigilance
framework.
Note that while events are coded to MedDRA, drugs are submitted to the FAERS as free text and need a standardization. We implemented a FAERS-specific DiAna dictionary (https://doi.org/10.1101/2023.06.07.23291076). The get_drugnames function allows to retrieve the drugnames that were translated to the drug of interest.
#> Warning in instance$preRenderHook(instance): It seems your data is too big for
#> client-side DataTables. You may consider server-side processing:
#> https://rstudio.github.io/DT/server.html
In this code segment, we delve into the process of cases
identification and retrieval within the DiAna package. Firstly, reports
associated with the specified drug of interest, denoted as
drugs_selected, are identified using the
Drug dataset. Similarly, adverse event reports related to the chosen
event (events_selected) are extracted from
the Reac dataset. The intersection of these two sets of primary IDs
(pids_drugs and
pids_events) identifies cases where both
the drug and the adverse event are reported. These cases are crucial for
in-depth individual assessment and analysis.
# we identify the reports with the drug
pids_drugs <- unique(Drug[substance %in% drugs_selected]$primaryid)
# we identify the reports with the event
pids_events <- unique(Reac[pt %in% events_selected]$primaryid)
# we identify cases as reports with both the drug and the event
pids_cases <- intersect(pids_drugs, pids_events)To facilitate further scrutiny and analysis, DiAna offers the
retrieve() function, which gathers the
information about these identified cases from the different FAERS
dataset and stores it into two Excel files: one with a row for each
case, one with multiple rows for each case recording more detailed drug
information. The file_name parameter,
specified as
paste0(project_path, "individual_cases"),
ensures that the retrieved cases are stored within the designated
subproject folder.
# retrieve the cases into an excel for individual assessment
retrieve(pids_cases, file_name = paste0(project_path, "individual_cases"))This step is pivotal, allowing pharmacovigilance experts to conduct detailed examinations of individual cases, thereby enhancing the precision and depth of the analysis. DiAna’s seamless case retrieval mechanism simplifies the process, empowering users to focus on in-depth assessments of specific adverse event patterns associated with the chosen drug. In particular, it allows to identify duplicates not detected with the implemented algorithms and identify potential alternative causes and risk factors.
Aggregated statistics for this data can also be obtained using a descriptive function. This function can be used in multiple ways, even comparing the cases of interest with an appropriate reference group (RG). We could, for example, check whether individuals taking haloperidol and developing pneumonia are different from individuals taking haloperidol and developing other adverse events (as a proxy of the real population of haloperidol recipients). This may allow to identify potential confounders and risk factors. It is important to note that even if tests are performed and p-values are obtained, they only help in the hypothesis generation algorithm and don’t really identify a significant difference with the real population.
# aggregated statistics
descriptive(pids_cases, RG = pids_drugs, drug = drugs_selected, file_name = paste0(project_path, "descriptive.xlsx"))Using different functions we can also identify concomitants (drugs), comorbidities (indications) and cooccurrent reactions at any level or in a hierarchycal structure.
# we describe cooccurrences
head(reporting_rates(pids_cases, entity = "reaction"))
#> pt label_pt N_pt
#> 1: pneumonia pneumonia (100%) [991] 991
#> 2: respiratory failure respiratory failure (11.5%) [114] 114
#> 3: sepsis sepsis (11.4%) [113] 113
#> 4: dyspnoea dyspnoea (11.1%) [110] 110
#> 5: pyrexia pyrexia (11%) [109] 109
#> 6: pulmonary embolism pulmonary embolism (9.08%) [90] 90
# we describe indications for haloperidol
head(reporting_rates(pids_cases, entity = "indication", drug_indi = "haloperidol"))
#> pt label_pt N_pt
#> 1: schizophrenia schizophrenia (21.69%) [77] 77
#> 2: agitation agitation (10.99%) [39] 39
#> 3: psychotic disorder psychotic disorder (9.3%) [33] 33
#> 4: nausea nausea (7.61%) [27] 27
#> 5: delirium delirium (5.35%) [19] 19
#> 6: anxiety anxiety (5.35%) [19] 19
# we describe concomitant suspected of the event according to the ATC classification
hierarchycal_rates(pids_cases, "substance", drug_role = c("PS", "SS"))
DT::datatable(readxl::read_xlsx("/Users/michele.fusaroli/Desktop/DiAna_package/DiAna/projects/tutorial/reporting_rates.xlsx"))We can also use a network to visualize the syndromic co-reporting of events:
Let’s use a different example to explore the disproportionality_analysis function. We are investigating 5 manfestations of pathologic impulsivity (impulse control disorders: “gambling”, “hypersexuality”, “shopping”, “hyperphagia”, “kleptomania”) together with one known adverse drug reaction of aripiprazole (“headache”) and one clear bias (“bipolar disorder” is one of the major indications for using aripiprazole).
drug_selected <- "aripiprazole" # define the drug of interest
reac_selected <- list( # define the events of interest
"gambling disorder", "hypersexuality", "compulsive shopping", "hyperphagia", "kleptomania", "bipolar disorder", "headache"
)We perform a disproportionality analysis using the disproportionality_analysis() function to see whether each of the events investigated is more reported with the drug of interest than what expected based on the entire FAERS database assuming that the drug and the event are independent.And we visualize the information component, a measure of disproportionality, using a forest plot.
DPA_df <- disproportionality_analysis( # perform disproportionality analysis
drug_selected, reac_selected
)
render_forest(DPA_df, "IC", row = "event")We found an association between aripiprazole and all the events investigated apart from headache. Sharing this script, we allow everyone to replicate the study, to use this study design for their own similar inquiries, and to make as suggestions to improve the quality of our evidence, for example taking care of some important biases.
Duplicates may distort our results. DiAna already applied multiple strategies to detect potential duplicates, and will soon add even more advanced algorithms. The results of these duplicates detection is stored in the DEMO dataset.We are using now the RB_duplicates_only_susp algorithm, that looks for the same exact values in sex, age, country of occurrence, event date, list of events, and list of suspected drugs. This way we remove more than 3 milion reports over 16 total reports in the FAERS (quarterly data) up to 23Q1.Deduplicated reports are selected as those that are not identified as duplicates by the algorithm.Then we perform the disproportionality analysis restricting to deduplicated reports. And we store these results together with the results of the previous analysis to allow for visual comparison using a forest plot.
deduplicated <- Demo[RB_duplicates_only_susp == FALSE]$primaryid # identify deduplicated reports
DPA_df_deduplicated <- disproportionality_analysis( # disproportionality analysis with restriction
drug_selected, reac_selected,
restriction = deduplicated
)
df <- rbindlist(list( # storing together the results of different analyses
DPA_df[, nested := "main"],
DPA_df_deduplicated[, nested := "deduplicated"]
), fill = TRUE)
render_forest(df, "IC", # compare the results using a forest plot
nested = "nested", row = "event"
)In these cases, in fact, we don’t see much a difference comparing the analysis on the entire FAERS with the analysis on deduplicated data. This is not generalizable to other drugs and events, and to other deduplication algorithms.
Since bipolar disorder is a reason for using aripiprazole, we could think that patients with bipolar disorder are more susceptible to impulsivity and impulsive behaviors, and maybe that could be the only reason for the association between aripiprazole and impulsive behaviors (what is known as confounding by indication).The way we can mitigate this bias is by conditioning on the indication, for example restricting to patients with bipolar disorder alone.We therefore need to import the Indi dataset.To be more sensitive the search for bipolar disorder, we want to use the MedDRA hierarchy. Alas, the access to MedDRA is by subscription only and we cannot provide our ready-to-use file. But with a prescription you can follow the algorithm deascribed in the https://github.com/fusarolimichele/DiAna to make your own DiAna-compatible MedDRA.Thus, we can select only reports with, among indications, terms included in the hlgt “manic and bipolar mood disorders and disturbances”. And we repeat again the disproportionality analysis with a different restriction, and compare the results using the forest plot.
import_MedDRA() # import MedDRA hierarchy (available only upon subscription to MedDRA MSSO)
#> def soc
#> 1: cong congenital, familial and genetic disorders
#> 2: inv investigations
#> 3: inv investigations
#> 4: inv investigations
#> 5: inv investigations
#> ---
#> 25408: inj&p injury, poisoning and procedural complications
#> 25409: preg pregnancy, puerperium and perinatal conditions
#> 25410: surg surgical and medical procedures
#> 25411: surg surgical and medical procedures
#> 25412: blood blood and lymphatic system disorders
#> hlgt
#> 1: metabolic and nutritional disorders congenital
#> 2: endocrine investigations (incl sex hormones)
#> 3: endocrine investigations (incl sex hormones)
#> 4: endocrine investigations (incl sex hormones)
#> 5: endocrine investigations (incl sex hormones)
#> ---
#> 25408: procedural related injuries and complications nec
#> 25409: pregnancy, labour, delivery and postpartum conditions
#> 25410: obstetric and gynaecological therapeutic procedures
#> 25411: male genital tract therapeutic procedures
#> 25412: white blood cell disorders
#> hlt
#> 1: inborn errors of steroid synthesis
#> 2: reproductive hormone analyses
#> 3: reproductive hormone analyses
#> 4: reproductive hormone analyses
#> 5: reproductive hormone analyses
#> ---
#> 25408: non-site specific procedural complications
#> 25409: normal pregnancy, labour and delivery
#> 25410: cervix therapeutic procedures
#> 25411: male genital tract therapeutic procedures nec
#> 25412: eosinophilic disorders
#> pt
#> 1: 11-beta-hydroxylase deficiency
#> 2: 17 ketosteroids urine
#> 3: 17 ketosteroids urine decreased
#> 4: 17 ketosteroids urine increased
#> 5: 17 ketosteroids urine normal
#> ---
#> 25408: incision site skin puckering
#> 25409: spontaneous rupture of membranes
#> 25410: cervix stent placement
#> 25411: sperm cryopreservation
#> 25412: paraneoplastic eosinophilia
bipolar_disorder <- Indi[indi_pt %in% MedDRA[ # select reports with bipolar disorder among indications
hlgt == "manic and bipolar mood disorders and disturbances"
]$pt]$primaryid
## if MedDRA not available you can use isntead:
# bipolar_disorder <- Indi[indi_pt =="bipolar disorder]$primaryid
DPA_df_bipolar <- disproportionality_analysis( # disproportionality analysis with restriction
drug_selected, reac_selected,
restriction = bipolar_disorder
)
df <- rbindlist(list( # storing together the results of different analyses
df, DPA_df_bipolar[, nested := "bipolar"]
), fill = TRUE)
render_forest(df, "IC", # compare the results using a forest plot
nested = "nested", row = "event"
)Conditioning on the indication, we see that the association with bipolar disorder disappears (as it should be). Also the association with impulse control disorders is weakened, supporting the hypothesis that bipolar disorder is a risk factor for impulsivity but it is not sufficient alone to completely explain these pervasive behaviors. The confidence interval of kleptomania gets wider, since the number of cases gets too small, and the signal disappear.We can also see that the signal for headache, which is not associated with bipolar disorder, is not affected by this restriction.
The FDA warning in 2016 may have inflated the reporting distorting the results (i.e., notoriety bias). To mitigate this bias we can select only reports submitted before the date of the warning, and repeating the disproportionality analysis.
warning_date <- 20160305 # define the warning date
pre_warning <- Demo[init_fda_dt < warning_date]$primaryid # select only reports submitted before the warning
preW <- disproportionality_analysis( # disproportionality analysis with restriction
drug_selected, reac_selected,
restriction = pre_warning
)
df <- rbindlist(list( # storing together the results of different analyses
df, preW[, nested := "pre_warning"]
), fill = TRUE)
render_forest(df, "IC", # compare the results using a forest plot
nested = "nested", row = "event"
)Here you see in green the results of the analysis restricting to the pre-warning. You can see that this analysis does not affect bipolar disorder nor headache, but it affects all the impulse control disorders apart gambling. Plausibly the FDA warning was based mainly on suspect aripiprazole-induced gambling disorder.
MedDRA, used to code adverse events, is highly redundant: there are multiple terms that may be used to express the same concept. Therefore a more sensitive and specific approach should take into account all these terms.We can therefore redefine our reactions of interest for higher sensitivity.
reac_selected <- list( # redefined reaction for higher sensitivity
"gambling disorder" = list("gambling disorder", "gambling"),
"hypersexuality" = list("compulsive sexual behaviour", "hypersexuality", "excessive masturbation", "excessive sexual fantasies", "libido increased", "sexual activity increased", "kluver-bucy syndrome", "erotophonophilia", "exhibitionism", "fetishism", "frotteurism", "masochism", "paraphilia", "paedophilia", "sadism", "transvestism", "voyeurism", "sexually inappropriate behaviour"),
"compulsive shopping" = list("compulsive shopping"),
"hyperphagia" = list("binge eating", "food craving", "hyperphagia", "increased appetite"),
"kleptomania" = list("kleptomania", "shoplifting"),
"bipolar disorder" = as.list(MedDRA[hlt == "bipolar disorders"]$pt),
"headache" = as.list(MedDRA[hlgt == "headaches"]$pt)
)
custom_group <- disproportionality_analysis( # perform the disproportionality analysis
drug_selected, reac_selected,
meddra_level = "custom"
)
df <- rbindlist(list( # storing together the results of different analyses
df, custom_group[, nested := "custom-groups"]
), fill = TRUE)
render_forest(df, "IC", # compare the results using a forest plot
nested = "nested", row = "event"
)As we can see here custom groups affect mainly kleptomania. Indeed many reporters prefer to use shoplifting instead of kleptomania, and therefore the signal is now much stronger.# Complete analysisIn conclusion, we can integrate all the restrictions and use custom groups to have a unified analysis.
restriction <- intersect(intersect(deduplicated, bipolar_disorder), pre_warning) # integrate all the restrictions
complete <- disproportionality_analysis( # disproportionality
drug_selected, reac_selected,
meddra_level = "custom", restriction = restriction
)
render_forest(complete, "IC", # forest plot
row = "event"
)Therefore, you can see that even before the warning and taking into account all the considered biases, the FDA had enough information to publish the warning.
Congratulations! You have successfully conducted basic disproportionality analysis using DiAna. This vignette covered the installation of the package, downloading necessary files, and the basic usage of its functionalities. DiAna empowers the pharmacovigilance community to make informed decisions and contribute to the collective knowledge of drug safety.
For more advanced analyses and detailed explanations, explore our advanced vignettes and documentation (in progress). DiAna is here to simplify your pharmacovigilance journey and foster collaboration within the community.